176 resultados para Mineração de dados (Computação)

em Universidade Federal do Rio Grande do Norte(UFRN)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, telecommunications is one of the most dynamic and strategic areas in the world. Organizations are always seeking to find new management practices within an ever increasing competitive environment where resources are getting scarce. In this scenario, data obtained from business and corporate processes have even greater importance, although this data is not yet adequately explored. Knowledge Discovery in Databases (KDD) appears then, as an option to allow the study of complex problems in different areas of management. This work proposes both a systematization of KDD activities using concepts from different methodologies, such as CRISP-DM, SEMMA and FAYYAD approaches and a study concerning the viability of multivariate regression analysis models to explain corporative telecommunications sales using performance indicators. Thus, statistical methods were outlined to analyze the effects of such indicators on the behavior of business productivity. According to business and standard statistical analysis, equations were defined and fit to their respective determination coefficients. Tests of hypotheses were also conducted on parameters with the purpose of validating the regression models. The results show that there is a relationship between these development indicators and the amount of sales

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly the partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage, that consists of inferring, through its neurons, relevant characteristics of the data set. In general, such processing applied to the network neurons, instead of the entire database, reduces the computational costs due to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and the search for the shortest path with the greatest reward. Such methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after training the network. Thus, the goal consists of defining more clearly the arrangement of the clusters present in the data. Experiments were carried out so as to evaluate the proposed methods using various artificially generated data sets, as well as real world data sets. The results obtained were compared with those from a number of well-known methods existent in the literature

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A fragilidade brasileira quanto à competitividade turística é um fato observável nos dados da Organização Mundial do Turismo. O Brasil caiu em 2011, da 45ª para a 52ª posição, apesar de liderar no atributo recursos naturais e estar colocado na 23° em recursos culturais. Assim, grandes interesses e esforços têm sido direcionados para o estudo da competitividade dos produtos e destinos turísticos. O destino turístico é caracterizado por um conjunto complexo e articulado de fatores tangíveis e intangíveis, apresentando alta complexidade, dados de elevada dimensionalidade, não linearidade e comportamento dinâmico, tornando-se difícil a modelagem desses processos por meio de abordagens baseadas em técnicas estatísticas clássicas. Esta tese investigou modelos de equações estruturais e seus algoritmos, aplicados nesta área, analisando o ciclo completo de análise de dados, em um processo confirmatório no desenvolvimento e avaliação de um modelo holístico da satisfação do turista; na validação da estrutura do modelo de medida e do modelo estrutural, por meio de testes de invariância de múltiplos grupos; na análise comparativa dos métodos de estimação MLE, GLS e ULS para a modelagem da satisfação e na realização de segmentação de mercado no setor de destino turístico utilizando mapas auto-organizáveis de Kohonen e sua validação com modelagem de equações estruturais. Aplicações foram feitas em análises de dados no setor de turismo, principal indústria de serviços do Estado do Rio Grande do Norte, tendo sido, teoricamente desenvolvidos e testados empiricamente, modelos de equações estruturais em padrões comportamentais de destino turístico. Os resultados do estudo empírico se basearam em pesquisas com a técnica de amostragem aleatória sistemática, efetuadas em Natal-RN, entre Janeiro e Março de 2013 e forneceram evidências sustentáveis de que o modelo teórico proposto é satisfatório, com elevada capacidade explicativa e preditiva, sendo a satisfação o antecedente mais importante da lealdade no destino. Além disso, a satisfação é mediadora entre a geração da motivação da viagem e a lealdade do destino e que os turistas buscam primeiro à satisfação com a qualidade dos serviços de turismo e, posteriormente, com os aspectos que influenciam a lealdade. Contribuições acadêmicas e gerenciais são mostradas e sugestões de estudo são dadas para trabalhos futuros.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Currently, one of the biggest challenges for the field of data mining is to perform cluster analysis on complex data. Several techniques have been proposed but, in general, they can only achieve good results within specific areas providing no consensus of what would be the best way to group this kind of data. In general, these techniques fail due to non-realistic assumptions about the true probability distribution of the data. Based on this, this thesis proposes a new measure based on Cross Information Potential that uses representative points of the dataset and statistics extracted directly from data to measure the interaction between groups. The proposed approach allows us to use all advantages of this information-theoretic descriptor and solves the limitations imposed on it by its own nature. From this, two cost functions and three algorithms have been proposed to perform cluster analysis. As the use of Information Theory captures the relationship between different patterns, regardless of assumptions about the nature of this relationship, the proposed approach was able to achieve a better performance than the main algorithms in literature. These results apply to the context of synthetic data designed to test the algorithms in specific situations and to real data extracted from problems of different fields

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The opening of the Brazilian market of electricity and competitiveness between companies in the energy sector make the search for useful information and tools that will assist in decision making activities, increase by the concessionaires. An important source of knowledge for these utilities is the time series of energy demand. The identification of behavior patterns and description of events become important for the planning execution, seeking improvements in service quality and financial benefits. This dissertation presents a methodology based on mining and representation tools of time series, in order to extract knowledge that relate series of electricity demand in various substations connected of a electric utility. The method exploits the relationship of duration, coincidence and partial order of events in multi-dimensionals time series. To represent the knowledge is used the language proposed by Mörchen (2005) called Time Series Knowledge Representation (TSKR). We conducted a case study using time series of energy demand of 8 substations interconnected by a ring system, which feeds the metropolitan area of Goiânia-GO, provided by CELG (Companhia Energética de Goiás), responsible for the service of power distribution in the state of Goiás (Brazil). Using the proposed methodology were extracted three levels of knowledge that describe the behavior of the system studied, representing clearly the system dynamics, becoming a tool to assist planning activities

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Traditional applications of feature selection in areas such as data mining, machine learning and pattern recognition aim to improve the accuracy and to reduce the computational cost of the model. It is done through the removal of redundant, irrelevant or noisy data, finding a representative subset of data that reduces its dimensionality without loss of performance. With the development of research in ensemble of classifiers and the verification that this type of model has better performance than the individual models, if the base classifiers are diverse, comes a new field of application to the research of feature selection. In this new field, it is desired to find diverse subsets of features for the construction of base classifiers for the ensemble systems. This work proposes an approach that maximizes the diversity of the ensembles by selecting subsets of features using a model independent of the learning algorithm and with low computational cost. This is done using bio-inspired metaheuristics with evaluation filter-based criteria

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is the Fuzzy C-Means (FCM). This thesis proposes to implement a new way of calculating the cluster centers in the procedure of FCM algorithm which are called ckMeans, and in some variants of FCM, in particular, here we apply it for those variants that use other distances. The goal of this change is to reduce the number of iterations and processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. Also, we developed an algorithm based on ckMeans to manipulate interval data considering interval membership degrees. This algorithm allows the representation of data without converting interval data into punctual ones, as it happens to other extensions of FCM that deal with interval data. In order to validate the proposed methodologies it was made a comparison between a clustering for ckMeans, K-Means and FCM algorithms (since the algorithm proposed in this paper to calculate the centers is similar to the K-Means) considering three different distances. We used several known databases. In this case, the results of Interval ckMeans were compared with the results of other clustering algorithms when applied to an interval database with minimum and maximum temperature of the month for a given year, referring to 37 cities distributed across continents

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Data clustering is applied to various fields such as data mining, image processing and pattern recognition technique. Clustering algorithms splits a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means Algorithm (FCM) is a fuzzy clustering algorithm most used and discussed in the literature. The performance of the FCM is strongly affected by the selection of the initial centers of the clusters. Therefore, the choice of a good set of initial cluster centers is very important for the performance of the algorithm. However, in FCM, the choice of initial centers is made randomly, making it difficult to find a good set. This paper proposes three new methods to obtain initial cluster centers, deterministically, the FCM algorithm, and can also be used in variants of the FCM. In this work these initialization methods were applied in variant ckMeans.With the proposed methods, we intend to obtain a set of initial centers which are close to the real cluster centers. With these new approaches startup if you want to reduce the number of iterations to converge these algorithms and processing time without affecting the quality of the cluster or even improve the quality in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Educational Data Mining is an application domain in artificial intelligence area that has been extensively explored nowadays. Technological advances and in particular, the increasing use of virtual learning environments have allowed the generation of considerable amounts of data to be investigated. Among the activities to be treated in this context exists the prediction of school performance of the students, which can be accomplished through the use of machine learning techniques. Such techniques may be used for student’s classification in predefined labels. One of the strategies to apply these techniques consists in their combination to design multi-classifier systems, which efficiency can be proven by results achieved in other studies conducted in several areas, such as medicine, commerce and biometrics. The data used in the experiments were obtained from the interactions between students in one of the most used virtual learning environments called Moodle. In this context, this paper presents the results of several experiments that include the use of specific multi-classifier systems systems, called ensembles, aiming to reach better results in school performance prediction that is, searching for highest accuracy percentage in the student’s classification. Therefore, this paper presents a significant exploration of educational data and it shows analyzes of relevant results about these experiments.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Soft skills and teamwork practices were identi ed as the main de ciencies of recent graduates in computer courses. This issue led to a realization of a qualitative research aimed at investigating the challenges faced by professors of those courses in conducting, monitoring and assessing collaborative software development projects. Di erent challenges were reported by teachers, including di culties in the assessment of students both in the collective and individual levels. In this context, a quantitative research was conducted with the aim to map soft skill of students to a set of indicators that can be extracted from software repositories using data mining techniques. These indicators are aimed at measuring soft skills, such as teamwork, leadership, problem solving and the pace of communication. Then, a peer assessment approach was applied in a collaborative software development course of the software engineering major at the Federal University of Rio Grande do Norte (UFRN). This research presents a correlation study between the students' soft skills scores and indicators based on mining software repositories. This study contributes: (i) in the presentation of professors' perception of the di culties and opportunities for improving management and monitoring practices in collaborative software development projects; (ii) in investigating relationships between soft skills and activities performed by students using software repositories; (iii) in encouraging the development of soft skills and the use of software repositories among software engineering students; (iv) in contributing to the state of the art of three important areas of software engineering, namely software engineering education, educational data mining and human aspects of software engineering.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Soft skills and teamwork practices were identi ed as the main de ciencies of recent graduates in computer courses. This issue led to a realization of a qualitative research aimed at investigating the challenges faced by professors of those courses in conducting, monitoring and assessing collaborative software development projects. Di erent challenges were reported by teachers, including di culties in the assessment of students both in the collective and individual levels. In this context, a quantitative research was conducted with the aim to map soft skill of students to a set of indicators that can be extracted from software repositories using data mining techniques. These indicators are aimed at measuring soft skills, such as teamwork, leadership, problem solving and the pace of communication. Then, a peer assessment approach was applied in a collaborative software development course of the software engineering major at the Federal University of Rio Grande do Norte (UFRN). This research presents a correlation study between the students' soft skills scores and indicators based on mining software repositories. This study contributes: (i) in the presentation of professors' perception of the di culties and opportunities for improving management and monitoring practices in collaborative software development projects; (ii) in investigating relationships between soft skills and activities performed by students using software repositories; (iii) in encouraging the development of soft skills and the use of software repositories among software engineering students; (iv) in contributing to the state of the art of three important areas of software engineering, namely software engineering education, educational data mining and human aspects of software engineering.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The relevance of rising healthcare costs is a main topic in complementary health companies in Brazil. In 2011, these expenses consumed more than 80% of the monthly health insurance in Brazil. Considering the administrative costs, it is observed that the companies operating in this market work, on average, at the threshold between profit and loss. This paper presents results after an investigation of the welfare costs of a health plan company in Brazil. It was based on the KDD process and explorative Data Mining. A diversity of results is presented, such as data summarization, providing compact descriptions of the data, revealing common features and intrinsic observations. Among the key findings was observed that a small portion of the population is responsible for the most demanding of resources devoted to health care

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The progresses of the Internet and telecommunications have been changing the concepts of Information Technology IT, especially with regard to outsourcing services, where organizations seek cost-cutting and a better focus on the business. Along with the development of that outsourcing, a new model named Cloud Computing (CC) evolved. It proposes to migrate to the Internet both data processing and information storing. Among the key points of Cloud Computing are included cost-cutting, benefits, risks and the IT paradigms changes. Nonetheless, the adoption of that model brings forth some difficulties to decision-making, by IT managers, mainly with regard to which solutions may go to the cloud, and which service providers are more appropriate to the Organization s reality. The research has as its overall aim to apply the AHP Method (Analytic Hierarchic Process) to decision-making in Cloud Computing. There to, the utilized methodology was the exploratory kind and a study of case applied to a nationwide organization (Federation of Industries of RN). The data collection was performed through two structured questionnaires answered electronically by IT technicians, and the company s Board of Directors. The analysis of the data was carried out in a qualitative and comparative way, and we utilized the software to AHP method called Web-Hipre. The results we obtained found the importance of applying the AHP method in decision-making towards the adoption of Cloud Computing, mainly because on the occasion the research was carried out the studied company already showed interest and necessity in adopting CC, considering the internal problems with infrastructure and availability of information that the company faces nowadays. The organization sought to adopt CC, however, it had doubt regarding the cloud model and which service provider would better meet their real necessities. The application of the AHP, then, worked as a guiding tool to the choice of the best alternative, which points out the Hybrid Cloud as the ideal choice to start off in Cloud Computing. Considering the following aspects: the layer of Infrastructure as a Service IaaS (Processing and Storage) must stay partly on the Public Cloud and partly in the Private Cloud; the layer of Platform as a Service PaaS (Software Developing and Testing) had preference for the Private Cloud, and the layer of Software as a Service - SaaS (Emails/Applications) divided into emails to the Public Cloud and applications to the Private Cloud. The research also identified the important factors to hiring a Cloud Computing provider

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The objective of this thesis is proposes a method for a mobile robot to build a hybrid map of an indoor, semi-structured environment. The topological part of this map deals with spatial relationships among rooms and corridors. It is a topology-based map, where the edges of the graph are rooms or corridors, and each link between two distinct edges represents a door. The metric part of the map consists in a set of parameters. These parameters describe a geometric figure which adapts to the free space of the local environment. This figure is calculated by a set of points which sample the boundaries of the local free space. These points are obtained with range sensors and with knowledge about the robot s pose. A method based on generalized Hough transform is applied to this set of points in order to obtain the geomtric figure. The building of the hybrid map is an incremental procedure. It is accomplished while the robot explores the environment. Each room is associated with a metric local map and, consequently, with an edge of the topo-logical map. During the mapping procedure, the robot may use recent metric information of the environment to improve its global or relative pose

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This graduate thesis proposes a model to asynchronously replicate heterogeneous databases. This model singularly combines -in a systematic way and in a single project -different concepts, techniques and paradigms related to the areas of database replication and management of heterogeneous databases. One of the main advantages of the replication is to allow applications to continue to process information, during time intervals when they are off the network and to trigger the database synchronization, as soon as the network connection is reestablished. Therefore, the model introduces a communication and update protocol that takes in consideration the environment of asynchronous characteristics used. As part of the work, a tool was developed in Java language, based on the model s premises in order to process, test, simulate and validate the proposed model